{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "modular-concentration"
      },
      "outputs": [],
      "source": [
        "# Copyright 2021 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "insured-graduation"
      },
      "source": [
        "# Feedback or issues?\n",
        "\n",
        "For any feedback or questions, please open an [issue](https://github.com/googleapis/python-aiplatform/issues)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pregnant-going"
      },
      "source": [
        "# Vertex SDK for Python: AutoML Video Classification Example\n",
        "To use this Jupyter notebook, copy the notebook to a Google Cloud Notebooks instance with Tensorflow installed and open it. You can run each step, or cell, and see its results. To run a cell, use Shift+Enter. Jupyter automatically displays the return value of the last line in each cell. For more information about running notebooks in Google Cloud Notebook, see the [Google Cloud Notebook guide](https://cloud.google.com/vertex-ai/docs/general/notebooks).\n",
        "\n",
        "\n",
        "This notebook demonstrate how to create an AutoML Video Classification Model, with a Vertex AI video dataset, and how to serve the model for batch prediction. It will require you provide a bucket where the dataset will be stored.\n",
        "\n",
        "Note: you may incur charges for training, prediction, storage or usage of other GCP products in connection with testing this SDK."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pending-chamber"
      },
      "source": [
        "### Install Vertex SDK for Python\n",
        "\n",
        "\n",
        "After the SDK installation the kernel will be automatically restarted."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "coated-remark"
      },
      "outputs": [],
      "source": [
        "!pip3 uninstall -y google-cloud-aiplatform\n",
        "!pip3 install google-cloud-aiplatform\n",
        "import IPython\n",
        "\n",
        "app = IPython.Application.instance()\n",
        "app.kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "incorporated-edgar"
      },
      "source": [
        "### Enter Your Project and GCS Bucket\n",
        "\n",
        "Enter your Project Id in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "hispanic-macedonia"
      },
      "outputs": [],
      "source": [
        "MY_PROJECT = \"YOUR PROJECT\"\n",
        "MY_STAGING_BUCKET = \"gs://YOUR BUCKET\"  # bucket should be in same region as ucaip"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "efovKMU5WW7u"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    import os\n",
        "\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()\n",
        "    os.environ[\"GOOGLE_CLOUD_PROJECT\"] = MY_PROJECT"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "historical-consciousness"
      },
      "source": [
        "### Set Your Task Name, and GCS Prefix\n",
        "\n",
        "If you want to centeralize all input and output files under the gcs location."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "organizational-salad"
      },
      "outputs": [],
      "source": [
        "TASK_TYPE = \"mbsdk_automl-video-training\"\n",
        "PREDICTION_TYPE = \"classification\"\n",
        "MODEL_TYPE = \"CLOUD\"\n",
        "\n",
        "TASK_NAME = f\"{TASK_TYPE}_{PREDICTION_TYPE}\"\n",
        "BUCKET_NAME = MY_STAGING_BUCKET.split(\"gs://\")[1]\n",
        "GCS_PREFIX = TASK_NAME\n",
        "\n",
        "print(f\"Bucket Name:    {BUCKET_NAME}\")\n",
        "print(f\"Task Name:      {TASK_NAME}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "compact-engagement"
      },
      "source": [
        "# HMDB: a large human motion database\n",
        "We prepared some training data and prediction data for the demo using the [HMDB Dataset](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database).\n",
        "\n",
        "The HMDB Dataset is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/\n",
        "\n",
        "For more information about this dataset please visit: https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SPDHQoFRD-vM"
      },
      "outputs": [],
      "source": [
        "automl_video_demo_train_data = (\n",
        "    \"gs://automl-video-demo-data/hmdb_split1_5classes_all.csv\"\n",
        ")\n",
        "automl_video_demo_batch_prediction_data = (\n",
        "    \"gs://automl-video-demo-data/hmdb_split1_predict.jsonl\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "professional-bulletin"
      },
      "source": [
        "### Copy AutoML Video Demo Train Data for Creating Managed Dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "accurate-producer"
      },
      "outputs": [],
      "source": [
        "gcs_source_train = f\"gs://{BUCKET_NAME}/{TASK_NAME}/data/video_classification.csv\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sticky-casino"
      },
      "outputs": [],
      "source": [
        "!gsutil cp $automl_video_demo_train_data $gcs_source_train"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rough-alert"
      },
      "source": [
        "# Run AutoML Video Training with Managed Video Dataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "adaptive-slovakia"
      },
      "source": [
        "## Initialize Vertex SDK for Python\n",
        "\n",
        "Initialize the *client* for Vertex AI."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "figured-fellow"
      },
      "outputs": [],
      "source": [
        "from google.cloud import aiplatform\n",
        "\n",
        "aiplatform.init(project=MY_PROJECT, staging_bucket=MY_STAGING_BUCKET)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pleasant-holmes"
      },
      "source": [
        "## Create a Dataset on Vertex AI\n",
        "We will now create a Vertex AI video dataset using the previously prepared csv files. Choose one of the options below. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ln-8NdHjTfbH"
      },
      "source": [
        "Option 1: Using MBSDK VideoDataset class"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uVBfL-0TTjNS"
      },
      "outputs": [],
      "source": [
        "dataset = aiplatform.VideoDataset.create(\n",
        "    display_name=f\"temp-{TASK_NAME}\",\n",
        "    gcs_source=gcs_source_train,\n",
        "    import_schema_uri=aiplatform.schema.dataset.ioformat.video.classification,\n",
        "    sync=False,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lXCA_nvHTp_I"
      },
      "source": [
        "Option 2: Using MBSDK Dataset class\n",
        "```\n",
        "dataset = aiplatform.Dataset.create(\n",
        "    display_name=f'temp-{TASK_NAME}',\n",
        "    metadata_schema_uri=aiplatform.schema.dataset.metadata.video,\n",
        "    gcs_source=gcs_source_train, \n",
        "    import_schema_uri=aiplatform.schema.dataset.ioformat.video.classification,\n",
        "    sync=False\n",
        ")\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3x4xuyIbVR_N"
      },
      "outputs": [],
      "source": [
        "dataset.wait()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mexican-spending"
      },
      "source": [
        "## Launch a Training Job and Create a Model on Vertex AI"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dynamic-piece"
      },
      "source": [
        "### Config a Training Job"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "continuous-circular"
      },
      "outputs": [],
      "source": [
        "job = aiplatform.AutoMLVideoTrainingJob(\n",
        "    display_name=f\"temp-{TASK_NAME}\",\n",
        "    prediction_type=PREDICTION_TYPE,\n",
        "    model_type=MODEL_TYPE,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "juvenile-parameter"
      },
      "source": [
        "### Run the Training Job"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "human-carrier"
      },
      "outputs": [],
      "source": [
        "model = job.run(\n",
        "    dataset=dataset,\n",
        "    training_fraction_split=0.8,\n",
        "    test_fraction_split=0.2,\n",
        "    model_display_name=f\"temp-{TASK_NAME}\",\n",
        "    sync=False,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "abstract-textbook"
      },
      "outputs": [],
      "source": [
        "model.wait()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "noted-usage"
      },
      "source": [
        "# Batch Prediction Job on the Model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ruled-smith"
      },
      "source": [
        "### Copy AutoML Video Demo Prediction Data for Creating Batch Prediction Job"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "polished-dispatch"
      },
      "outputs": [],
      "source": [
        "gcs_source_batch_prediction = (\n",
        "    f\"gs://{BUCKET_NAME}/{TASK_NAME}/data/video_classification_batch_prediction.jsonl\"\n",
        ")\n",
        "gcs_destination_prefix_batch_prediction = (\n",
        "    f\"gs://{BUCKET_NAME}/{TASK_NAME}/batch_prediction\"\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "objective-soldier"
      },
      "outputs": [],
      "source": [
        "!gsutil cp $automl_video_demo_batch_prediction_data $gcs_source_batch_prediction"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "piano-middle"
      },
      "outputs": [],
      "source": [
        "batch_predict_job = model.batch_predict(\n",
        "    job_display_name=f\"temp-{TASK_NAME}\",\n",
        "    gcs_source=gcs_source_batch_prediction,\n",
        "    gcs_destination_prefix=gcs_destination_prefix_batch_prediction,\n",
        "    sync=False,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "visible-scientist"
      },
      "outputs": [],
      "source": [
        "batch_predict_job.wait()\n",
        "bp_iter_outputs = batch_predict_job.iter_outputs()\n",
        "\n",
        "prediction_results = list()\n",
        "for blob in bp_iter_outputs:\n",
        "    if blob.name.split(\"/\")[-1].startswith(\"prediction\"):\n",
        "        prediction_results.append(blob.name)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "moving-geneva"
      },
      "outputs": [],
      "source": [
        "import json\n",
        "\n",
        "import tensorflow as tf\n",
        "\n",
        "tags = list()\n",
        "for prediction_result in prediction_results:\n",
        "    gfile_name = f\"gs://{bp_iter_outputs.bucket.name}/{prediction_result}\"\n",
        "    with tf.io.gfile.GFile(name=gfile_name, mode=\"r\") as gfile:\n",
        "        for line in gfile.readlines():\n",
        "            line = json.loads(line)\n",
        "            break\n",
        "\n",
        "print(line)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "AI_Platform_(Unified)_SDK_AutoML_Video_Classification.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
